A Comparison of Sampling Techniquesfor Web Graph Characterization

نویسندگان

  • Luca Becchetti
  • Carlos Castillo
  • Debora Donato
  • Adriano Fazzone
چکیده

We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the application on the Internet. However, because of Information management hardware, high volume of data and word similarities in different fields the most answers to the user s’ questions aren`t correct. So the web graph clustering and cluster placement in corresponding answers helps user to achieve his or her intended results. Community (web communit...

متن کامل

designing and implementing a 3D indoor navigation web application

​During the recent years, the need arises for indoor navigation systems for guidance of a client in natural hazards and fire, due to the fact that human settlements have been complicating. This research paper aims to design and implement a visual indoor navigation web application. The designed system processes CityGML data model automatically and then, extracts semantic, topologic and geometric...

متن کامل

Decoding the structure of the WWW: facts versus sampling biases

The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been tackled recently by characterizing the properties of its representative graphs in which vertices and directed edges are identified with web-pages and hyperlinks, respectively. Data gathered in large scale crawls have been analyzed by s...

متن کامل

A model for specification, composition and verification of access control policies and its application to web services

Despite significant advances in the access control domain, requirements of new computational environments like web services still raise new challenges. Lack of appropriate method for specification of access control policies (ACPs), composition, verification and analysis of them have all made the access control in the composition of web services a complicated problem. In this paper, a new indepe...

متن کامل

Sampling from social networks’s graph based on topological properties and bee colony algorithm

In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006